Exact Hypothesis Tests for Log-linear Models with exactLoglinTest
This manuscript overviews exact testing of goodness of fit for log-linear models using the R package exactLoglinTest. This package evaluates model fit for Poisson log-linear models by conditioning on minimal sufficient statistics to remove nuisance parameters. A Monte Carlo algorithm is proposed to estimate P values from the resulting conditional distribution. In particular, this package implements a sequentially rounded normal approximation and importance sampling to approximate probabilities from the conditional distribution. Usually, this results in a high percentage of valid samples. However, when this is not the case, a Metropolis-Hastings algorithm can be used that makes more localized jumps within the reference set. The manuscript details how some conditional tests for binomial logit models can also be viewed as conditional Poisson log-linear models and hence can be performed via exactLoglinTest. A diverse battery of examples is considered to highlight the use, features and extensions of the software. Notably, potential extensions to evaluating disclosure risk are also considered.
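To make the conditioning idea concrete, here is a minimal sketch in Python (not R, and not exactLoglinTest's algorithm). It assumes the simplest log-linear model, independence in a two-way table, whose minimal sufficient statistics are the row and column totals. Instead of the package's sequentially rounded normal proposal with importance sampling, it swaps in a plain permutation sampler: shuffling column labels against fixed row labels draws tables exactly from the conditional (multivariate hypergeometric) null distribution given both margins. Function names are hypothetical, and strictly positive margins are assumed.

```python
import random


def pearson_stat(table):
    """Pearson chi-square statistic for independence in a two-way table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, r in enumerate(table):
        for j, obs in enumerate(r):
            exp = rows[i] * cols[j] / n  # fitted value under independence
            stat += (obs - exp) ** 2 / exp
    return stat


def table_from_labels(row_labels, col_labels, nr, nc):
    """Rebuild a contingency table from paired row/column labels."""
    t = [[0] * nc for _ in range(nr)]
    for a, b in zip(row_labels, col_labels):
        t[a][b] += 1
    return t


def mc_exact_pvalue(table, n_sim=10000, seed=0):
    """Monte Carlo estimate of the exact conditional P value (margins fixed)."""
    rng = random.Random(seed)
    nr, nc = len(table), len(table[0])
    # One (row, column) label pair per observation in the table.
    row_labels = [i for i in range(nr) for j in range(nc) for _ in range(table[i][j])]
    col_labels = [j for i in range(nr) for j in range(nc) for _ in range(table[i][j])]
    observed = pearson_stat(table)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(col_labels)  # keeps both margins fixed
        sim = table_from_labels(row_labels, col_labels, nr, nc)
        if pearson_stat(sim) >= observed - 1e-9:  # tolerance for ties
            hits += 1
    return (hits + 1) / (n_sim + 1)
```

For example, `mc_exact_pvalue([[20, 0], [0, 20]])` gives a P value near 1/(n_sim + 1), while a table with all cells equal gives 1.0. A permutation sampler like this only covers models whose sufficient statistics are simple margins, which is one reason a more general proposal is needed for arbitrary log-linear models.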
Fast, Exact Bootstrap Principal Component Analysis for p>1 million
Many have suggested a bootstrap procedure for estimating the sampling
variability of principal component analysis (PCA) results. However, when the
number of measurements per subject (p) is much larger than the number of
subjects (n), calculating and storing the leading principal
components from each bootstrap sample can be computationally infeasible. To
address this, we outline methods for fast, exact calculation of bootstrap
principal components, eigenvalues, and scores. Our methods leverage the fact
that all bootstrap samples occupy the same n-dimensional subspace as the
original sample. As a result, all bootstrap principal components are limited to
the same n-dimensional subspace and can be efficiently represented by their
low-dimensional coordinates in that subspace. Several uncertainty metrics can
be computed solely from the bootstrap distribution of these low-dimensional
coordinates, without calculating or storing the p-dimensional bootstrap
components. Fast bootstrap PCA is applied to a dataset of sleep
electroencephalogram (EEG) recordings (p = …, n = …) and to a dataset of
brain magnetic resonance images (MRIs) (p ≈ 3 million, n = …). For the
brain MRI dataset, our method allows standard errors for the first 3
principal components based on 1000 bootstrap samples to be calculated on a
standard laptop in 47 minutes, as opposed to approximately 4 days with standard
methods.
Comment: 25 pages, including 9 figures and a link to the R package. 2014-05-14
update: final formatting edits for journal submission, condensed figure
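The subspace argument can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' R package: `X` is assumed to be a p × n matrix with subjects as columns, each bootstrap sample is re-centred, and all function and variable names are hypothetical. After one thin SVD of the centred data, `Xc = U @ Y` with `Y` only n × n, so each bootstrap SVD is an n × n problem and only n × k coordinates `A` are stored; a p-dimensional bootstrap component can be recovered on demand as `U @ A`.

```python
import numpy as np


def fast_bootstrap_pca(X, n_boot, k, seed=0):
    """Exact bootstrap PCA computed inside the n-dimensional sample subspace.

    X is p x n with subjects as columns. Returns U (p x n), bootstrap
    singular values (n_boot x k), low-dimensional coordinates
    (n_boot x n x k) and the resampled column indices.
    """
    rng = np.random.default_rng(seed)
    _, n = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)             # centre each measurement
    U, d, Vt = np.linalg.svd(Xc, full_matrices=False)  # one thin SVD: U is p x n
    Y = d[:, None] * Vt                                # n x n coordinates, Xc = U @ Y
    svals = np.empty((n_boot, k))
    coords = np.empty((n_boot, n, k))
    indices = np.empty((n_boot, n), dtype=int)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)               # resample subjects
        Yb = Y[:, idx]
        Yb = Yb - Yb.mean(axis=1, keepdims=True)       # re-centre; stays in span(U)
        A, s, _ = np.linalg.svd(Yb, full_matrices=False)  # n x n problem only
        svals[b] = s[:k]
        coords[b] = A[:, :k]                           # store n x k, not p x k
        indices[b] = idx
    return U, svals, coords, indices
```

Because `U` is shared across all bootstrap samples, summaries such as pointwise variability of a component can be computed from the stored coordinates (after aligning signs) and mapped back through `U` only when needed, which is where the savings for very large p come from.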
Sparse Median Graphs Estimation in a High Dimensional Semiparametric Model
In this manuscript a unified framework for conducting inference on complex
aggregated data in high dimensional settings is proposed. The data are assumed
to be a collection of multiple non-Gaussian realizations with underlying
undirected graphical structures. Utilizing the concept of median graphs in
summarizing the commonality across these graphical structures, a novel
semiparametric approach to modeling such complex aggregated data is provided
along with robust estimation of the median graph, which is assumed to be
sparse. The estimator is proved to be consistent in graph recovery, and an upper
bound on the rate of convergence is given. Experiments on both synthetic and
real datasets are conducted to illustrate the empirical usefulness of the
proposed models and methods.
Fixed-width output analysis for Markov chain Monte Carlo
Markov chain Monte Carlo is a method of producing a correlated sample in
order to estimate features of a target distribution via ergodic averages. A
fundamental question is: when should sampling stop? That is, when are the
ergodic averages good estimates of the desired quantities? We consider a method
that stops the simulation when the width of a confidence interval based on an
ergodic average is less than a user-specified value. Hence calculating a Monte
Carlo standard error is a critical step in assessing the simulation output. We
consider the regenerative simulation and batch means methods of estimating the
variance of the asymptotic normal distribution. We give sufficient conditions
for the strong consistency of both methods and investigate their finite sample
properties in a variety of examples.
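A minimal sketch of the batch means approach combined with a fixed-width stopping rule, in plain Python. Assumptions: the "sampler" is a toy AR(1) process with target mean 0 standing in for real MCMC output, the batch size is floor(sqrt(n)), and the rule stops once the half-width 1.96 × MCSE falls below a user-chosen tolerance (a normal quantile is used for simplicity). Function names are hypothetical, and regenerative simulation is not shown.

```python
import math
import random


def ar1_chain(n, rho, rng, x0=0.0):
    """Toy stand-in for MCMC output: AR(1) chain with stationary mean 0."""
    xs, x = [], x0
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, 1.0)
        xs.append(x)
    return xs


def batch_means_mcse(xs):
    """Ergodic average and its batch means Monte Carlo standard error."""
    n = len(xs)
    b = math.isqrt(n)            # batch size floor(sqrt(n))
    a = n // b                   # number of full batches
    used = xs[: a * b]
    mean = sum(used) / len(used)
    bm = [sum(used[k * b:(k + 1) * b]) / b for k in range(a)]
    # Batch means estimate of the asymptotic variance sigma^2
    sigma2 = b * sum((m - mean) ** 2 for m in bm) / (a - 1)
    return mean, math.sqrt(sigma2 / len(used))


def fixed_width_run(rho, half_width, rng, n_min=1000, step=1000, n_max=10**6):
    """Extend the chain until 1.96 * MCSE < half_width (or n_max is reached)."""
    xs = ar1_chain(n_min, rho, rng)
    while True:
        mean, se = batch_means_mcse(xs)
        if 1.96 * se < half_width or len(xs) >= n_max:
            return mean, se, len(xs)
        # Continue the same chain from its last state.
        xs.extend(ar1_chain(step, rho, rng, x0=xs[-1]))
```

With `rho = 0.5` the asymptotic variance of the ergodic average is 1/(1 - rho)^2 = 4, so a half-width of 0.05 requires on the order of several thousand iterations; the exact stopping point varies because the batch means estimate is itself noisy.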
Joint Estimation of Multiple Graphical Models from High Dimensional Time Series
In this manuscript we consider the problem of jointly estimating multiple
graphical models in high dimensions. We assume that the data are collected from
n subjects, each of which consists of T possibly dependent observations. The
graphical models of subjects vary, but are assumed to change smoothly
corresponding to a measure of closeness between subjects. We propose a kernel
based method for jointly estimating all graphical models. Theoretically, under
a double asymptotic framework, where both (T,n) and the dimension d can
increase, we provide the explicit rate of convergence in parameter estimation.
It characterizes the strength one can borrow across different individuals and
the impact of data dependence on parameter estimation. Empirically, experiments on
both synthetic and real resting state functional magnetic resonance imaging
(rs-fMRI) data illustrate the effectiveness of the proposed method.
Comment: 40 pages
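The kernel idea of borrowing strength across nearby subjects can be illustrated with a short NumPy sketch. Every specific here is an assumption, not taken from the paper: each subject has a scalar covariate used as its "position" (for example, age), closeness is measured by a Gaussian kernel with bandwidth `h`, each subject's covariance is a kernel-weighted average of all subjects' sample covariances, and the graph is read from partial correlations via a plain matrix inverse. A genuinely high-dimensional method would replace that inverse with a sparse precision-matrix estimator; function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(u, h):
    # Hypothetical closeness weight between subjects at distance u
    return np.exp(-0.5 * (u / h) ** 2)


def smoothed_covariances(data, positions, h):
    """Kernel-weighted covariance estimate for every subject.

    data: list of n arrays of shape (T, d); positions: float array of shape (n,).
    """
    covs = np.stack([np.cov(x, rowvar=False) for x in data])  # (n, d, d)
    dist = positions[:, None] - positions[None, :]             # (n, n)
    w = gaussian_kernel(dist, h)
    w = w / w.sum(axis=1, keepdims=True)                       # rows sum to 1
    return np.einsum("ij,jkl->ikl", w, covs)                   # borrow strength


def partial_correlations(cov):
    # Nonzero off-diagonal entries correspond to edges of a Gaussian graph
    prec = np.linalg.inv(cov)
    scale = np.sqrt(np.diag(prec))
    pc = -prec / np.outer(scale, scale)
    np.fill_diagonal(pc, 1.0)
    return pc
```

The bandwidth controls the trade-off described in the abstract: a very large `h` pools all subjects equally, while a very small `h` falls back to each subject's own sample covariance.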
A User-Friendly Introduction to Link-Probit-Normal Models
Probit-normal models have attractive properties compared to logit-normal models. In particular, they allow for easy specification of marginal links of interest while permitting a conditional random effects structure. Moreover, programming fitting algorithms for probit-normal models can be trivial with the use of well-developed algorithms for approximating multivariate normal quantiles. In typical settings, the data cannot distinguish between probit and logit conditional link functions. Therefore, if marginal interpretations are desired, the default conditional link should be the most convenient one. We refer to models with a probit conditional link, an arbitrary marginal link, and a normal random effect distribution as link-probit-normal models. In this manuscript we outline these models and discuss appropriate situations for using multivariate normal approximations. Unlike other manuscripts in this area that focus on very general situations and implement Markov chain Monte Carlo or MCEM algorithms, we focus on simpler, random intercept settings and give a collection of user-friendly examples and reproducible code. Marginally, the link-probit-normal model is obtained by a non-linear model on a discretized multivariate normal distribution, and thus can be thought of as a special case of discretizing a multivariate t distribution (as the degrees of freedom go to infinity). We also consider the larger class of multivariate t marginal models and illustrate how these models can be used to closely approximate a logit link.
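The easy specification of marginal links rests on a standard identity: if b ~ N(0, σ²) and P(Y = 1 | b) = Φ(Δ + b), then P(Y = 1) = Φ(Δ / √(1 + σ²)). The Python sketch below assumes a random-intercept model with a marginal logit link, solves for the conditional linear predictor Δ that reproduces the target marginal probability, and checks the identity numerically by Gauss-Hermite quadrature. Names are hypothetical; this is not the manuscript's code.

```python
import math
from statistics import NormalDist

import numpy as np

PHI = NormalDist()  # standard normal distribution


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def conditional_predictor(eta, sigma):
    """Delta such that E_b[Phi(Delta + b)] = expit(eta) for b ~ N(0, sigma^2)."""
    return math.sqrt(1.0 + sigma ** 2) * PHI.inv_cdf(expit(eta))


def marginal_prob(delta, sigma, n_nodes=60):
    """Integrate Phi(delta + b) over b ~ N(0, sigma^2) by Gauss-Hermite."""
    z, w = np.polynomial.hermite_e.hermegauss(n_nodes)  # weight exp(-z^2 / 2)
    vals = np.array([PHI.cdf(delta + sigma * zi) for zi in z])
    return float(np.sum(w * vals) / math.sqrt(2.0 * math.pi))
```

Replacing `expit` with any other inverse link gives the corresponding link-probit-normal model, which is the sense in which the marginal link can be chosen freely.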
A Novel and Simple Rule of Thumb for Multiplicity Control in Equivalence Testing Using Two One-Sided Tests
Equivalence testing is growing in use in scientific research outside of its traditional role in the drug approval process. Largely due to its ease of use and its recommendation in United States Food and Drug Administration guidance, the most common statistical method for testing (bio)equivalence is the two one-sided tests procedure (TOST). Like classical point-null hypothesis testing, TOST is subject to multiplicity concerns as more comparisons are made. In this manuscript, a condition that bounds the family-wise error rate (FWER) using TOST is given. This condition then leads to a simple solution for controlling the FWER. Specifically, we demonstrate that if all pairwise comparisons of k independent groups are being evaluated for equivalence, then simply dividing the nominal Type I error rate by (k - 1) is sufficient to keep the family-wise error rate at or below the desired value. The resulting rule is much less conservative than the equally simple Bonferroni correction. An example of equivalence testing in a non-drug-development setting is given.
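A minimal Python sketch of applying the (k - 1) rule to all pairwise comparisons of k independent groups. Assumptions: a large-sample z version of TOST with Welch-type standard errors (t-based TOST is more common in practice), a symmetric equivalence margin `margin`, and hypothetical function names. Each pair is tested at α/(k - 1), whereas Bonferroni would use α/(k(k - 1)/2).

```python
import math
from itertools import combinations
from statistics import NormalDist, fmean, variance


def equivalence_alpha(alpha, k):
    """Per-comparison level from the (k - 1) rule for all pairs of k groups."""
    return alpha / (k - 1)


def tost_z(x, y, margin, alpha):
    """Large-sample TOST: declare equivalence if both one-sided z tests reject."""
    diff = fmean(x) - fmean(y)
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    z = NormalDist().inv_cdf(1.0 - alpha)
    lower_rejects = (diff + margin) / se > z   # H0: diff <= -margin
    upper_rejects = (margin - diff) / se > z   # H0: diff >= margin
    return lower_rejects and upper_rejects


def pairwise_equivalence(groups, margin, alpha=0.05):
    """Run TOST on every pair at the (k - 1)-adjusted level."""
    a = equivalence_alpha(alpha, len(groups))
    return {(i, j): tost_z(groups[i], groups[j], margin, a)
            for i, j in combinations(range(len(groups)), 2)}
```

For k = 4 groups and α = 0.05, each pair is tested at 0.05/3 ≈ 0.0167 rather than the Bonferroni level 0.05/6 ≈ 0.0083.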